Knowledge Gradient for Multi-objective Multi-armed Bandit Algorithms

Authors

  • Saba Q. Yahyaa
  • Madalina M. Drugan
  • Bernard Manderick
Abstract

We extend the knowledge gradient (KG) policy to the multi-objective multi-armed bandits problem in order to efficiently explore the Pareto-optimal arms. We consider two partial-order relationships for ordering the mean vectors: Pareto dominance and scalarization functions. Pareto KG finds the optimal arms using Pareto search, while scalarized KG transforms the multi-objective arms into single-objective arms in order to find the optimal arms. To measure the performance of the proposed algorithms, we propose three regret measures. We compare the performance of the KG policy with UCB1 on a multi-objective multi-armed bandits problem, where KG outperforms UCB1.
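To make the first ordering concrete, here is a minimal sketch of Pareto dominance between mean-reward vectors and of selecting the non-dominated (Pareto-optimal) arm set. This is an illustration of the general notion the abstract relies on, not the paper's exact algorithm; the function names and the example mean vectors are invented for this sketch.

```python
import numpy as np

def dominates(u, v):
    """True if mean vector u Pareto-dominates v: u is at least as good
    as v in every objective and strictly better in at least one."""
    u, v = np.asarray(u), np.asarray(v)
    return bool(np.all(u >= v) and np.any(u > v))

def pareto_front(means):
    """Indices of arms whose mean vectors are not dominated by any other arm."""
    return [i for i, m in enumerate(means)
            if not any(dominates(o, m)
                       for j, o in enumerate(means) if j != i)]

# Hypothetical two-objective arms: the last arm is dominated by [0.5, 0.5].
means = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9], [0.4, 0.4]]
print(pareto_front(means))  # → [0, 1, 2]
```

In a Pareto approach such as the one described above, all arms in this front are considered optimal, and a fair policy would aim to play each of them.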


Related articles

Exponentiated Gradient LINUCB for Contextual Multi-Armed Bandits

We present Exponentiated Gradient LINUCB, an algorithm for contextual multi-armed bandits. This algorithm uses Exponentiated Gradient to find the optimal exploration parameter of LINUCB. Within a deliberately designed offline simulation framework, we conduct evaluations with real online event log data. The experimental results demonstrate that our algorithm outperforms the surveyed algorithms.


An Empirical Analysis of Bandit Convex Optimization Algorithms

We perform an empirical analysis of bandit convex optimization (BCO) algorithms. We motivate and introduce multi-armed bandits, and explore the scenario where the player faces an adversary that assigns different losses. In particular, we describe adversaries that assign linear losses as well as general convex losses. We then implement various BCO algorithms in the unconstrained setting and nume...


Finite-time analysis for the knowledge-gradient policy and a new testing environment for optimal learning

We consider two learning scenarios, the offline Bayesian ranking and selection problem with independent normal rewards and the online multi-armed bandit problem. We derive the first finite-time bound of the knowledge-gradient policy for ranking and selection problems under the assumption that the value of information is submodular. We demonstrate submodularity for the two-alternative case and p...


Linear Scalarized Knowledge Gradient in the Multi-Objective Multi-Armed Bandits Problem

The multi-objective, multi-armed bandits (MOMABs) problem is a Markov decision process with stochastic rewards. Each arm generates a vector of rewards instead of a single reward, and these multiple rewards might be conflicting. The agent has a set of optimal arms, and the agent's goal is not only to find the optimal arms but also to play them fairly. To find the optimal arm set, the agent uses a...
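The linear scalarization mentioned in this title can be sketched briefly: each arm's mean vector is collapsed to a scalar via a weight vector, after which the agent can treat the problem as an ordinary single-objective bandit. This is an illustrative sketch of linear scalarization in general, not the paper's scalarized-KG algorithm; the weights and mean vectors are invented for the example.

```python
import numpy as np

def linear_scalarize(mean_vectors, weights):
    """Collapse each arm's mean vector to a scalar score w . mu
    (weights w >= 0, sum(w) == 1) and return the argmax arm."""
    scores = np.asarray(mean_vectors) @ np.asarray(weights)
    return scores, int(np.argmax(scores))

means = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
scores, best = linear_scalarize(means, [0.2, 0.8])
print(best)  # → 2: weights favoring objective 2 pick the arm [0.1, 0.9]
```

Different weight vectors steer the scalarized bandit toward different parts of the Pareto front, which is why scalarization-based methods typically run with a set of weight vectors rather than a single one.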


Multi-armed Bandit Algorithms and Empirical Evaluation

The multi-armed bandit problem for a gambler is to decide which arm of a K-slot machine to pull to maximize his total reward in a series of trials. Many real-world learning and optimization problems can be modeled in this way. Several strategies or algorithms have been proposed as a solution to this problem in the last two decades, but, to our knowledge, there has been no common evaluation of t...




Publication date: 2014